
Extracting Match Stats From Halo Infinite Film Files

This blog post is part of a series on exploring the Halo game API.

Introduction #

One of the conversations in my blog comments led to a discussion about film files in Halo Infinite. In case you are not familiar with them, no worries - they’re a pretty obscure component of the match data that I haven’t yet covered in depth on my blog or here, on the OpenSpartan blog.

The idea behind film files is simple - they aren’t a traditional video but rather a collection of game engine metadata captured during gameplay. When you complete a match, a “film” (a recording of all match metadata) is saved, and you end up with a whole bunch of binary content that is available through a dedicated API endpoint.

Before we go down this rabbit hole, I want to give a massive shout-out to Andy Curtis for doing quite a bit of work digging through film file structure 🙌

Finding the film files #

Before we get to the film content, let’s figure out how to find the films. To get started, first request your own matches from the Halo Infinite API. This will give you the match IDs that we can later use to query for film data. You can send a request to this endpoint to get the most recent matches:

https://halostats.svc.halowaypoint.com/hi/players/xuid({{XUID}})/matches?count=25

In the example above, {{XUID}} is your numeric player identifier. I talked about the process of converting a gamertag into a XUID in a separate blog post.

You will need to make sure that you authenticate for the API call above to succeed (and all other API calls in this blog post). You can learn more about this in Halo Infinite Web API Authentication.
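If you prefer to make that call from code instead of a raw HTTP client, here is a minimal C# sketch (assuming you already have a Spartan token from the authentication flow above; the header name matches the one used by the Bash script later in this post):

// Minimal sketch: fetch the 25 most recent matches for a player.
// YOUR_TOKEN and YOUR_XUID are placeholders you need to fill in yourself.
using System.Net.Http;

var client = new HttpClient();
client.DefaultRequestHeaders.Add("x-343-authorization-spartan", "v4=YOUR_TOKEN");
client.DefaultRequestHeaders.Add("Accept", "application/json");

var url = "https://halostats.svc.halowaypoint.com/hi/players/xuid(YOUR_XUID)/matches?count=25";
var json = await client.GetStringAsync(url);
Console.WriteLine(json);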

The match data you get back is in JSON format by default, like this:

{
    "Start": 0,
    "Count": 25,
    "ResultCount": 25,
    "Results": [
        {
            "MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4",
            "MatchInfo": {
                "StartTime": "2024-09-16T02:35:15.505Z",
                "EndTime": "2024-09-16T02:42:08.144Z",
                "Duration": "PT6M31.0705518S",
                "LifecycleMode": 3,
                "GameVariantCategory": 9,
                "LevelId": "1216247c-bf6d-4740-8270-e800a114f231",
                "MapVariant": {
                    "AssetKind": 2,
                    "AssetId": "37a9b5f0-6be7-4a46-8010-1fe6f7ea5611",
                    "VersionId": "e1cbf812-4f4e-44fc-9ef8-dd9ab5c4e4cf"
                },
                "UgcGameVariant": {
                    "AssetKind": 6,
                    "AssetId": "0e198591-ac15-4f99-8ff2-dd390decad66",
                    "VersionId": "168e6c3a-fdf3-4edd-af79-c0ffe5475026"
                },
                "ClearanceId": "bb31018c-8ca3-4673-b870-5193cfdf18f5",
                "Playlist": {
                    "AssetKind": 3,
                    "AssetId": "1b1691dc-d8b9-4b1f-825d-cb1c065184c1",
                    "VersionId": "38ecf0d8-82ca-4831-b186-eda51653f2ba"
                },
                "PlaylistExperience": 2,
                "PlaylistMapModePair": {
                    "AssetKind": 7,
                    "AssetId": "6b7c20a9-5eed-476f-9716-6d20e2f37f1a",
                    "VersionId": "56c4ba81-a659-4168-bc02-8f4135e693f9"
                },
                "SeasonId": "Csr/Seasons/CsrSeason8-1.json",
                "PlayableDuration": "PT6M31.063S",
                "TeamsEnabled": true,
                "TeamScoringEnabled": true,
                "GameplayInteraction": 1
            },
            "LastTeamId": 1,
            "Outcome": 2,
            "Rank": 1,
            "PresentAtEndOfMatch": true
        },
        [...MORE MATCH DATA...]
      ]
}

This is all useful metadata, but we are looking specifically for the match ID captured in the MatchId property. In my case, the match I am looking for is 4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4, which is a recent Husky Raid game I’ve been a part of.

With the match ID in hand, we can now request the film chunks (every film has several “chunks” that are just binary data) by constructing the URL for another API endpoint, like this:

https://discovery-infiniteugc.svc.halowaypoint.com
  /hi
  /films
  /matches
  /4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4
  /spectate

If the call succeeds, the metadata you will get will look like this:

{
    "FilmStatusBond": 1,
    "CustomData": {
        "FilmLength": 403190,
        "Chunks": [
            {
                "Index": 0,
                "ChunkStartTimeOffsetMilliseconds": 0,
                "DurationMilliseconds": 11,
                "ChunkSize": 465309,
                "FileRelativePath": "/filmChunk0",
                "ChunkType": 1
            },
            {
                "Index": 1,
                "ChunkStartTimeOffsetMilliseconds": 0,
                "DurationMilliseconds": 19972,
                "ChunkSize": 47858,
                "FileRelativePath": "/filmChunk1",
                "ChunkType": 2
            },
            {
                "Index": 2,
                "ChunkStartTimeOffsetMilliseconds": 19973,
                "DurationMilliseconds": 20003,
                "ChunkSize": 122480,
                "FileRelativePath": "/filmChunk2",
                "ChunkType": 2
            },
            [...MORE CHUNKS...]
        ],
        "HasGameEnded": true,
        "ManifestRefreshSeconds": 30,
        "MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4",
        "FilmMajorVersion": 37
    },
    "BlobStoragePathPrefix": "https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/",
    "AssetId": "1c7442bd-1f8d-4593-b7d0-1c95618c6876"
}

The Halo Infinite API handles films by splitting them into separate chunks that contain different classes of in-game metadata for different parts of the game. You can see those chunks yourself in Theater mode - the timeline is clearly split into them (see the black markers):

Film fragments in Theater mode in Halo Infinite.

Film chunks are player-independent - they are recorded for the match itself and contain metadata about all players in it. To get the content of each chunk, we construct the URL from the BlobStoragePathPrefix property and the FileRelativePath of each chunk:

https://blobs-infiniteugc.svc.halowaypoint.com
  /ugcstorage
  /film
  /1c7442bd-1f8d-4593-b7d0-1c95618c6876
  /e6796b9c-eb98-4c32-879a-5e5ab3d567f1
  /filmChunk0

While this is not explicitly called out, the first GUID is the film asset ID and the second is the film asset version, similar to how game asset metadata is associated in the game CMS. If you already have film IDs, you can get the films directly without worrying about getting match IDs first.

With the URLs ready, we can now download every single chunk for a match and analyze them. If you are on Linux (or using Windows Subsystem for Linux) you can use this Bash script to quickly download all film chunks for a match (make sure to replace your token and clearance):

#!/bin/bash

# Check if match ID is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <MATCH_ID>"
    exit 1
fi

MATCH_ID=$1

# Headers for the API request
AUTH_HEADER="x-343-authorization-spartan: v4=YOUR_TOKEN"
CLEARANCE_HEADER="343-clearance: CURRENT_CLEARANCE"
LANGUAGE_HEADER="Accept-Language: en-us"
ACCEPT_HEADER="accept: application/json"

echo "Fetching chunk information for match: ${MATCH_ID}..."
RESPONSE=$(curl --silent --location --request GET "https://discovery-infiniteugc.svc.halowaypoint.com/hi/films/matches/${MATCH_ID}/spectate" \
    --header "${AUTH_HEADER}" \
    --header "${CLEARANCE_HEADER}" \
    --header "${LANGUAGE_HEADER}" \
    --header "${ACCEPT_HEADER}" \
    -w "%{http_code}" -o response.json)

HTTP_STATUS="${RESPONSE}"
echo $HTTP_STATUS

# Check for successful response
if [[ "$HTTP_STATUS" != "200" ]]; then
    echo "Error fetching data: HTTP status $HTTP_STATUS"
    exit 1
fi

# Extract the base URL and film chunk paths
BASE_URL=$(jq -r '.BlobStoragePathPrefix' response.json)
CHUNK_PATHS=$(jq -r '.CustomData.Chunks[].FileRelativePath' response.json | sed 's|^/||')  # Remove leading slashes

# Clean up response file
rm response.json

# Loop through each chunk and download it
for CHUNK_PATH in $CHUNK_PATHS; do
    # Construct the full URL
    FULL_URL="${BASE_URL}${CHUNK_PATH}"
    COMPRESSED_FILE="compressed${CHUNK_PATH##*/}"
    DECOMPRESSED_FILE="DECOMPRESSED_${CHUNK_PATH##*/}"

    # Download the compressed chunk
    echo "Downloading chunk from ${FULL_URL}..."
    curl --location --request GET "${FULL_URL}" \
        --header "${AUTH_HEADER}" \
        --header "${CLEARANCE_HEADER}" \
        --header "${LANGUAGE_HEADER}" \
        --header "${ACCEPT_HEADER}" \
        --output "${COMPRESSED_FILE}"

    # Decompress the chunk
    echo "Decompressing ${COMPRESSED_FILE}..."
    python3 -c "import zlib, sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < "${COMPRESSED_FILE}" > "${DECOMPRESSED_FILE}.bin"

    # Clean up compressed file
    rm "${COMPRESSED_FILE}"
    echo "Decompressed chunk saved as ${DECOMPRESSED_FILE}."
done

echo "All chunks downloaded and decompressed!"

You can make the script executable with chmod +x yourscript.sh and then run it by passing the match GUID as the first argument:

./yourscript.sh 1C5F57D3-1418-4BDE-A970-F8FAB6DFE110

This script helpfully decompresses the chunks as well, but we’ll get to that a bit later in this post.

As you look at the metadata for each chunk you will notice that individual chunks have a type. From what I can infer, they break down like this:

Chunk type | Description
1 | Game bootstrap metadata
2 | In-game event captures
3 | Game summary metadata

We’ll be using every single one of them in our explorations.

Dissecting chunk metadata #

Looking at existing chunks, we see that the ones with type 1 or 2 have very sparse event data, at least on the surface. However, they contain valuable information that we will need. To explore the content, let’s download a random chunk for an existing match:

https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage
  /film
  /1c7442bd-1f8d-4593-b7d0-1c95618c6876
  /e6796b9c-eb98-4c32-879a-5e5ab3d567f1
  /filmChunk3

Opening it in a hex editor produces this result:

Binary content for a Halo Infinite film chunk.

Not exactly “human-readable”, and that’s because we’re missing a core step here - decompression. The clue is in the first two bytes of the chunk file, 78 5E, which indicate zlib Fast Compression. You can read more about it in the official RFC. Looks like we’re dealing with compressed data, and we therefore need to “extract” it before attempting to read anything.
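As a side note, if you would rather stay in C# for the decompression, .NET 6 and later ship System.IO.Compression.ZLibStream, which handles exactly this format. A quick sketch, with the file names being just examples:

using System.IO.Compression;

// Peek at the header first: a zlib stream starts with 0x78.
byte[] raw = File.ReadAllBytes("chunk-compressed.bin");
Console.WriteLine(raw[0] == 0x78 ? "Looks like zlib" : "Unexpected header");

// Decompress the chunk into its own file.
using var input = File.OpenRead("chunk-compressed.bin");
using var zlib = new ZLibStream(input, CompressionMode.Decompress);
using var output = File.Create("decompressed_output.bin");
zlib.CopyTo(output);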

Let’s do this a bit differently than the hex-editor-only approach - we’re going to download the binary file with cURL and then decompress it with Python. Assuming that you are not already using the script I shared earlier to download every chunk, our first step is this:

curl --location --request GET 'https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/filmChunk3' --header 'x-343-authorization-spartan: v4=YOUR_AUTH_HEADER' --header '343-clearance: YOUR_CLEARANCE' --header 'Accept-Language: en-us' --header 'accept: application/json' --output chunk-compressed.bin

And then, we can run a bit of inline Python magic to decompress the content we just downloaded into its own file - decompressed_output.bin:

python3 -c "import zlib, sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < chunk-compressed.bin > decompressed_output.bin

Uncompressed binary content for a Halo Infinite film chunk.

This looks a bit more promising because we can actually see repeating patterns. It’s even more promising if we look up events inside the chunk by the XUID of a player who was in the match. Because I am using a hex editor, I can easily search for the UInt64 value (all XUIDs are unsigned 64-bit integers), leading me to this:

7:B1E0h  00 00 00 00 00 00 00 00 00 00 00 00 5A 00 65 00  ............Z.e. 
7:B1F0h  42 00 6F 00 6E 00 64 00 00 00 00 00 00 00 00 00  B.o.n.d......... 
7:B200h  00 00 00 00 00 00 00 00 00 00 00 00 00 E5 DE DE  .............åÞÞ 
7:B210h  03 00 00 09 00 2D C0 00 00 00 04 58 00 00 00 00  .....-À....X.... 
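If your hex editor can’t search for integer values directly, you can generate the byte sequences to look for yourself. A small sketch (the XUID below is made up, and you may need to try both byte orders):

using System.Linq;

// Produce both byte orders of a XUID so you can paste them into a hex search.
ulong xuid = 2533274823451234; // hypothetical XUID - replace with a real one
byte[] littleEndian = BitConverter.GetBytes(xuid);
byte[] bigEndian = littleEndian.Reverse().ToArray();

Console.WriteLine(Convert.ToHexString(littleEndian));
Console.WriteLine(Convert.ToHexString(bigEndian));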

Because Halo Infinite is generally known to use quite a bit of Bond-encoded data, I wanted to pass the content of the file through my tool - bond-reader. Doing that was fruitless, though, as it turned out that the data is not Bond-formatted (at least not that I could tell from some short-term digging). I guess we’ll have to stick with inferring the binary structure through plain pattern analysis.

Another wrench thrown into our plans, also spotted by Andy Curtis, is the fact that data is not necessarily byte-aligned in the film chunks. That is, if you use a hex editor to spot patterns you might find some, but there is quite a bit of data “hiding” in plain sight simply because it isn’t positioned where a hex editor can render it.

Decoding unaligned data #

Because we can’t count on just our hex editor to find the data, we can write some custom code to find the things we want that are not aligned with our expectations 😎

To do that, here is a complete C# application that does just that - if you give it a byte pattern to search for (disregard the actual example pattern - it’s just a demo), it will try to find it regardless of how the data is actually aligned in the file:

namespace ComponentSearchByteAlign
{
    internal class Program
    {
        public static void Main(string[] args)
        {
            byte[] data = File.ReadAllBytes(@"PATH_TO_YOUR_DECOMPRESSED_BIN_FILE");

            // This can be a XUID or a gamertag to easily spot the data sequences
            byte[] pattern = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };

            List<int> matchPositions = FindPattern(data, pattern);

            if (matchPositions.Count > 0)
            {
                Console.WriteLine($"Pattern found at bit positions ({matchPositions.Count} total):");
                foreach (int position in matchPositions)
                {
                    Console.WriteLine(position);
                }
            }
            else
            {
                Console.WriteLine("Pattern not found.");
            }
        }

        public static List<int> FindPattern(byte[] data, byte[] pattern)
        {
            List<int> matchPositions = [];
            int dataBitLength = data.Length * 8;
            int patternBitLength = pattern.Length * 8;

            for (int bitPos = 0; bitPos <= dataBitLength - patternBitLength; bitPos++)
            {
                if (IsBitMatch(data, pattern, bitPos))
                {
                    matchPositions.Add(bitPos);
                }
            }
            return matchPositions;
        }

        public static bool IsBitMatch(byte[] data, byte[] pattern, int bitOffset)
        {
            // Calculates the number of whole bytes to skip.
            // We divide bitOffset by 8 because there are 8 bits per byte.
            int byteOffset = bitOffset / 8;

            // Calculates how far into the byte (number of bits) we need to start.
            // It's the remainder when bitOffset is divided by 8, giving the bit position within the byte.
            int bitShift = bitOffset % 8;

            // On the above, a good example to visualize the behavior:
            // If bitOffset = 10, byteOffset = 1 (skip 1 full byte) and bitShift = 2 (start at the 3rd bit in the second byte - we skip 2).

            // We now iterate through every byte in the pattern that is given to
            // us when the function is called.
            for (int i = 0; i < pattern.Length; i++)
            {
                // Get the data byte that aligns with the current
                // pattern byte and shift its bits to the left by the
                // bit shift value calculated earlier.
                byte dataByte = (byte)(data[byteOffset + i] << bitShift);

                // If bitShift > 0, include bits from the next byte. This is
                // important for scenarios where, for example, we're shifting
                // by 3 bits, meaning that part of the data will come from the
                // next byte.
                if (byteOffset + i + 1 < data.Length && bitShift > 0)
                {
                    // Shifts the next byte to the right by the delta between 8
                    // and the calculated bit shift value, aligning it with the
                    // remaining part of the data byte.
                    // Note: bitwise OR (|=) is used to combine the shifted parts
                    // so that we can perform a full byte comparison.
                    dataByte |= (byte)(data[byteOffset + i + 1] >> (8 - bitShift));
                }

                // Compare dataByte with the current byte in the pattern
                if (dataByte != pattern[i])
                {
                    // Not matching at position. No point in
                    // continuing.
                    return false;
                }
            }

            // All bits match
            return true;
        }
    }
}

Running this code will enable us to quickly detect the positions of data sequences that contain relevant information. For example, one of the observations about the film file is that we can spot XUID references by looking at the 0x2D 0xC0 pattern. If we use this pattern and run the tool across a set of film chunks we’ll see quite a few results:

Result of running the segment detection application.

How bit shifting works #

Before we go any further, though, let me explain a bit of the “magic” of bit shifting that you might’ve noticed in the program above. Let’s say we have a data array like this:

Byte Index | Hex Value | Binary
0 | 0xAB | 10101011
1 | 0xCD | 11001101
2 | 0xEF | 11101111
3 | 0x12 | 00010010

The pattern we want to look for is this:

Byte Index | Hex Value | Binary
0 | 0xCD | 11001101
1 | 0xEF | 11101111

Let’s pick a random bit offset - 10. That means we’re starting at bit 10 (counting from zero) in the data array. If we look at the IsBitMatch function, it takes the bit offset as an argument.

That means that if we pass 10 as the value, we get a byteOffset of 1, meaning that we skip one entire byte when looking for the data.

Now, keep in mind that calculating byteOffset was not a “clean” division - we have a remainder, which is helpfully captured by bitShift. That remainder is equal to 2, which means that within the byte at index 1 (remember, we skipped the one at index 0) we start at the third bit (we skip the first two, as bitShift tells us).

That can be visualized in a table like this:

Byte Index | Hex Value | Binary | Comment
0 | 0xAB | 10101011 | We’re skipping this entirely.
1 | 0xCD | 11001101 | We start comparing from the third bit.
2 | 0xEF | 11101111 | We’ll use the data from this byte to make sure we can build a full byte.
3 | 0x12 | 00010010 | Used in comparison later.

Now, I mentioned that we start our parsing with the byte at index 1 at the third bit. Look at the binary representation for that byte:

11001101

We skip the first two bits, and shift the bits left, padding the “missing” bits with zeroes at the end:

00110100

Now, instead of keeping the zeroes, we can steal the two leading bits from the next byte in our sequence (at index 2, that is, 0xEF). We shift it right by six bits to get its top two bits (because that’s all we need to complete the byte), so that:

11101111

Becomes:

00000011

So now from the shifted bytes we have these two values:

00110100
00000011

Combining them gives us:

00110111

This binary value does not match the first byte of our pattern (11001101), so the search moves on to the next offset, and so on.
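If you want to sanity-check the example, the same arithmetic can be reproduced with a few lines that mirror the logic inside IsBitMatch:

// Reproduce the worked example: data from the table above, bit offset 10.
byte[] data = { 0xAB, 0xCD, 0xEF, 0x12 };
int bitOffset = 10;

int byteOffset = bitOffset / 8; // 1
int bitShift = bitOffset % 8;   // 2

// Shift the current byte left and borrow the top bits of the next byte.
byte combined = (byte)(data[byteOffset] << bitShift);
combined |= (byte)(data[byteOffset + 1] >> (8 - bitShift));

Console.WriteLine(Convert.ToString(combined, 2).PadLeft(8, '0')); // prints 00110111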

Digging through the chunks #

So now that we have an idea of how to look for data, we can start looking at individual “envelopes” that contain player details. As I mentioned above, many chunks are usually provided for a given film; however, the ones that capture specific events, like deaths, kills, or medal awards, are all aggregated in the last film chunk file, the one with a ChunkType of 3.

Within the very last chunk (of type 3) the events are usually structured like this:

Field | Size
Header | 12 bytes
Gamertag (Unicode) | 32 bytes
Padding | 15 bytes
Type | 1 byte
Timestamp | 4 bytes
Padding | 3 bytes
Medal Marker | 1 byte
Padding | 3 bytes
Metadata (Medal Type) | 1 byte

Be careful with assuming that a gamertag is unique within a match. There were cases where the same match had a gamertag like MyGamertag and another MsMyGamertag - you can’t search just for MyGamertag as that will produce some unexpected results. You need to check that the 12 preceding bytes of “header” exist (arbitrary, given that I don’t know what they represent, but consistent for individual gamertags) and that the bytes before that header are 0x00 (I limit the check to 3 zero bytes). That way you can ensure that you are extracting a properly offset event.

Some matches may not have a chunk of type 3 - that’s very likely a bug in the API. Without this chunk there is no timeline you can parse as easily. Additionally, it’s entirely possible that the chunk of type 3 doesn’t contain gamertag-associated data. Additional investigation is needed to understand that behavior.
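Before reaching for specialized tooling, here is a rough C# sketch of what reading one of these envelopes could look like. It assumes the chunk is already decompressed, the event happens to be byte-aligned at a known offset, and the layout above holds - the field and record names are mine, not anything official:

// Rough sketch: read a single event envelope starting at "offset" in a
// decompressed type 3 chunk, following the layout described in the table above.
public record FilmEvent(string Gamertag, byte EventType, uint TimestampMilliseconds, byte MedalMarker, byte MedalType);

public static FilmEvent ReadEvent(byte[] chunk, int offset)
{
    // 12-byte header, then 32 bytes of UTF-16 gamertag.
    string gamertag = System.Text.Encoding.Unicode
        .GetString(chunk, offset + 12, 32)
        .TrimEnd('\0');

    // 15 bytes of padding, then a single event type byte.
    byte eventType = chunk[offset + 59];

    // 4-byte timestamp; reverse it before conversion, as shown later in the post.
    byte[] timestampBytes = new byte[4];
    Array.Copy(chunk, offset + 60, timestampBytes, 0, 4);
    Array.Reverse(timestampBytes);
    uint timestampMs = BitConverter.ToUInt32(timestampBytes, 0);

    // 3 bytes of padding, the medal marker, 3 more padding bytes, then the medal type.
    byte medalMarker = chunk[offset + 67];
    byte medalType = chunk[offset + 71];

    return new FilmEvent(gamertag, eventType, timestampMs, medalMarker, medalType);
}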

If you are using a tool like 010 Editor and extract the binary data on a per-file basis (i.e., find the bit positions for the gamertag start and then extract the bytes into its own file from there), you can use the following extremely basic binary template to highlight the sequences for easier parsing:

struct HEADER
{
    char bytes[12];
};

struct GAMERTAG
{
    char bytes[32];
};

struct TYPE
{
    char bytes[1];
};

struct TIMESTAMP
{
    char bytes[4];
};

struct BUFF_PADDING
{
    char bytes[15];
};

struct PADDING
{
    char bytes[3];
};

struct MEDAL_MARKER
{
    char bytes[1];
};

local int offset = 0;

HEADER header <bgcolor=0x659157>;
offset += sizeof(HEADER);
FSeek(offset);

GAMERTAG gt <bgcolor=cGreen>;
offset += sizeof(GAMERTAG);
FSeek(offset);

BUFF_PADDING bp <bgcolor=cBlue>;
offset += sizeof(BUFF_PADDING);
FSeek(offset);

TYPE type <bgcolor=cYellow>;
offset += sizeof(TYPE);
FSeek(offset);

TIMESTAMP ts <bgcolor=cRed>;
offset += sizeof(TIMESTAMP);
FSeek(offset);

PADDING padding <bgcolor=cBlue>;
offset += sizeof(PADDING);
FSeek(offset);

MEDAL_MARKER mm <bgcolor=0xF7AF9D>;
offset += sizeof(MEDAL_MARKER);
FSeek(offset);

PADDING padding <bgcolor=cBlue>;
offset += sizeof(PADDING);
FSeek(offset);

MEDAL_MARKER mtype <bgcolor=0xFFC0CB>;
offset += sizeof(MEDAL_MARKER);
FSeek(offset);

The structure above is consistent across matches - I’ve extracted thousands of my own games and ran into minimal issues (with the exception of a few stray gamertags).

Extracting timeline metadata #

Out of all the fields above, the most interesting to me is the metadata one. The metadata field (i.e., the medal type) captures numeric values that represent medals. The values are different from those in the official medal mapping, and there is no clear mapping between them and a human-readable JSON representation, so we need to infer them by looking at medal volume here and correlating it with medals earned per match or across a player’s career. Andy Curtis did the heavy lifting on this for some medals in his SPNKr project (a few are pending additional research).

The following medals are currently known:

Medal ID | Medal
0 | Double Kill
1 | Triple Kill
2 | Overkill
3 | Killtacular
4 | Killtrocity
5 | Killamanjaro
6 | Killtastrophe
7 | Killpocalypse
8 | Killionaire
9 | Killing Spree
10 | Killing Frenzy
11 | Running Riot
12 | Rampage
13 | Perfection
26 | Killjoy
27 | Nightmare
28 | Boogeyman
29 | Grim Reaper
30 | Demon
31 | Flawless Victory
32 | Steaktacular
36 | Stopped Short
37 | Flag Joust
38 | Goal Line Stand
39 | Necromancer
43 | Ace
44 | Extermination
45 | Sole Survivor
46 | Untainted
47 | Blight
48 | Disease
49 | Plague
51 | Pestilence
53 | Culling
54 | Cleansing
55 | Purge
56 | Purification
57 | Divine Intervention
58 | Zombie Slayer
59 | Undead Hunter
60 | Hell’s Janitor
61 | The Sickness
62 | Spotter
63 | Treasure Hunter
64 | Saboteur
65 | Wingman
66 | Wheelman
67 | Gunner
68 | Driver
69 | Pilot
70 | Tanker
71 | Rifleman
72 | Bomber
73 | Grenadier
74 | Boxer
75 | Warrior
76 | Gunslinger
77 | Scattergunner
78 | Sharpshooter
79 | Marksman
80 | Heavy
81 | Bodyguard
82 | Back Smack
83 | Nuclear Football
84 | Boom Block
85 | Bulltrue
86 | Cluster Luck
87 | Dogfight
88 | Harpoon
89 | Mind the Gap
90 | Ninja
91 | Odin’s Raven
92 | Pancake
93 | Quigley
94 | Remote Detonation
95 | Return to Sender
96 | Rideshare
97 | Skyjack
98 | Stick
99 | Tag & Bag
100 | Whiplash
101 | Kong
102 | Autopilot Engaged
103 | Sneak King
104 | Windshield Wiper
105 | Reversal
106 | Hail Mary
107 | Nade Shot
108 | Snipe
109 | Perfect
110 | Bank Shot
111 | Fire & Forget
112 | Ballista
113 | Pull
114 | No Scope
115 | Achilles Spine
116 | Grand Slam
117 | Guardian Angel
118 | Interlinked
119 | Death Race
120 | Chain Reaction
121 | 360
122 | Combat Evolved
123 | Deadly Catch
124 | Driveby
125 | Fastball
126 | Flyin’ High
127 | From the Grave
128 | From the Void
129 | Grapple-jack
130 | Hold This
131 | Last Shot
132 | Lawnmower
133 | Mount Up
134 | Off the Rack
135 | Quick Draw
137 | Pineapple Express
138 | Ramming Speed
139 | Reclaimer
140 | Shot Caller
141 | Yard Sale
142 | Special Delivery
146 | Fumble
148 | Straight Balling
151 | Always Rotating
152 | Hill Guardian
153 | Clock Stop
154 | Secure Line
156 | Splatter
162 | All That Juice
163 | Great Journey
165 | Breacher
166 | Mounted & Loaded
167 | Monopoly
168 | Counter-snipe
174 | Driving Spree
175 | Death Cabbie
176 | Immortal Chauffeur
177 | Blind Fire
178 | Hang Up
179 | Call Blocked
180 | Clear Reception

The event type, also captured in the envelope, can be one of the following:

Type (Decimal) | Description
10 | Mode-specific events (e.g., captured the flag, killed the carrier, stole the flag)
20 | Death
50 | Kill

Any other type identifier (such as 51, 100, or 250) that you may see here, when associated with a medal, is representative of the medal sorting weight. It maps 1:1 to the information that you can get from the medal metadata endpoint.

Timestamp data is represented in milliseconds from the start of the match. You can obtain a readable value with a C# snippet like this:

Array.Reverse(timestampBytes);
var timestamp = BitConverter.ToUInt32(timestampBytes, 0);
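From there, the value is trivial to present in a readable way (timestamp being the value computed in the snippet above):

var offsetInMatch = TimeSpan.FromMilliseconds(timestamp);
Console.WriteLine(offsetInMatch);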

One thing that I haven’t yet figured out is how assists are tracked within the event batch. They are likely captured as a XUID reference further in the event envelope that I haven’t gotten to yet. This will be a topic for another blog post as we dig further through the film file format.

Finding the gamertags #

Notice that to extract all events from the last chunk, one specific thing is still needed - we need to start by knowing the gamertags for which the events should be extracted. And because gamertags are technically arbitrary text, we need to find an index somewhere. To do that, we can look inside all the other chunks (the ones that are not of type 3). That’s right - to get the list of gamertags that were involved in a given game, we need to download and parse every film chunk other than the very last one that has ChunkType set to 3.

The last chunk contains information on all players in the game but doesn’t seem to contain a clear XUID and gamertag combination that would allow us to extract them cleanly. Luckily, inside all other chunks (where ChunkType is either 1 or 2), the gamertags and XUIDs can be found by looking for the pattern 0x2D 0xC0. From that pattern, we can deduce the following structure:

Field | Size / Value
Gamertag (Unicode) | Dynamic length (32 bytes max)
Padding | 21 bytes
XUID | 8 bytes
Marker 1 | 0x2D
Marker 2 | 0xC0

Keep in mind that gamertags are stored as Unicode (UTF-16) text. This means the padding can be deceiving when you look at the binary file - you might count 22 0x00 bytes between the visible gamertag text and the XUID, when in fact the zero byte adjacent to the text is just the trailing (high) byte of the gamertag’s last UTF-16 character. Make sure to be careful when parsing the values.

We can scan all film chunks for this pattern by identifying the markers, getting the XUID, checking that the preceding 21 bytes are 0x00 (padding), and then grabbing the 32 bytes of gamertag data, which can be parsed as a Unicode string. There are more safeguards we could put in place for this logic, but ultimately it’s good enough to extract the basic data.

Once the data is extracted into, say, a dictionary, we can use that as a starting point to look up gamertags in the final (summary) chunk.
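In code, that starting point can be as simple as a dictionary keyed on the XUID - a small sketch (the XUID and gamertag below are made up):

// Sketch: map XUIDs to gamertags found in the type 1/2 chunks,
// then turn each gamertag into a UTF-16 byte pattern to locate in the type 3 chunk.
var players = new Dictionary<long, string>
{
    { 2533274823451234, "SomeGamertag" } // hypothetical entry
};

foreach (var (xuid, gamertag) in players)
{
    byte[] pattern = System.Text.Encoding.Unicode.GetBytes(gamertag);
    // Feed "pattern" into FindPattern from earlier in the post to get bit positions.
}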

As I mentioned earlier, depending on the matches that you are getting, some of them might not have a chunk with ChunkType equal to 3. Others can return HTTP 404 (blob does not exist) errors when attempting to download a chunk. The former may be a bug. The latter is likely caused by the folks at 343 occasionally cleaning up the storage from older matches.

In C#, the extraction logic can be formalized as such:

public static byte[] ExtractBitsFromPosition(byte[] data, int startBitPosition, int bitLength)
{
    // Calculate the actual end bit position
    int endBitPosition = startBitPosition + bitLength - 1;

    // Validate input parameters
    if (startBitPosition < 0 || endBitPosition < 0 || startBitPosition >= data.Length * 8 || endBitPosition >= data.Length * 8 || startBitPosition > endBitPosition)
    {
        throw new ArgumentOutOfRangeException("Bit positions are out of range or invalid.");
    }

    // Calculate the byte offset and bit shift for the start position
    int startByteOffset = startBitPosition / 8;
    int startBitShift = startBitPosition % 8;

    // Calculate the byte offset and bit shift for the end position
    int endByteOffset = endBitPosition / 8;
    int endBitShift = endBitPosition % 8;

    // Calculate the number of bytes to extract
    int byteCount = endByteOffset - startByteOffset + 1;

    // If there's no bit shift, we can return from the byte offset onward
    if (startBitShift == 0 && endBitShift == 0)
    {
        byte[] result = new byte[byteCount];
        Array.Copy(data, startByteOffset, result, 0, byteCount);
        return result;
    }

    // Otherwise, we need to shift the bits manually
    byte[] extractedData = new byte[byteCount];

    // Go byte by byte, shift and copy
    for (int i = 0; i < byteCount - 1; i++)
    {
        // Shift the current byte and take bits from the next byte if needed
        extractedData[i] = (byte)((data[startByteOffset + i] << startBitShift) | (data[startByteOffset + i + 1] >> (8 - startBitShift)));
    }

    // Handle the last byte (since it has no next byte to pull from)
    extractedData[byteCount - 1] = (byte)(data[startByteOffset + byteCount - 1] << startBitShift);

    // Mask the last byte to only include bits up to endBitShift
    extractedData[byteCount - 1] &= (byte)(0xFF >> (7 - endBitShift));

    return extractedData;
}

Recall that the data may or may not be byte-aligned, so we need to operate on individual bits. In turn, once we find the marker pattern in the film chunks (as we try to spot the gamertag and XUID combos), we can extract the data around it with a function like this (where pattern is set to 0x2D 0xC0):

public static void ProcessData(byte[] data, byte[] pattern)
{
    List<int> patternPositions = FindPattern(data, pattern);

    foreach (int patternPosition in patternPositions)
    {
        int xuidStartPosition = patternPosition - 8 * 8;
        byte[] xuid = ExtractBitsFromPosition(data, xuidStartPosition, 8*8);
        var convertedXuid = ConvertBytesToInt64(xuid);

        if (convertedXuid != 0)
        {
            int prePatternPosition = xuidStartPosition - 21 * 8;
            var bytePrefixValidated = AreAllBytesZero(data, prePatternPosition, 21 * 8);

            if (bytePrefixValidated)
            {
                Console.WriteLine($"XUID: {convertedXuid}");
                byte[] undefinedData = ExtractBitsFromPosition(data, prePatternPosition - 32 * 8, 32 * 8);
                Console.WriteLine($"Undefined Data (until 0x00 0x00): {ConvertBytesToText(undefinedData)}");
            }
        }
    }

    Console.ReadLine();
}
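ProcessData leans on a few helpers that aren’t shown in the post (FindPattern is the one from the search tool earlier). Here is a minimal sketch of what the remaining ones could look like - note that the byte order in ConvertBytesToInt64 is an assumption on my part, so flip it if the extracted XUIDs look wrong:

public static long ConvertBytesToInt64(byte[] bytes)
{
    // Assumption: most significant byte first, mirroring the timestamp handling above.
    byte[] copy = (byte[])bytes.Clone();
    Array.Reverse(copy);
    return BitConverter.ToInt64(copy, 0);
}

public static bool AreAllBytesZero(byte[] data, int startBitPosition, int bitLength)
{
    // Reuse the bit-level extraction so this works on unaligned data too.
    byte[] slice = ExtractBitsFromPosition(data, startBitPosition, bitLength);
    return slice.All(b => b == 0);
}

public static string ConvertBytesToText(byte[] bytes)
{
    // Gamertags are UTF-16; decode and trim the trailing NUL characters.
    return System.Text.Encoding.Unicode.GetString(bytes).TrimEnd('\0');
}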

To simplify how I extract the data, I built a tool called OpenSpartan/film-event-extractor which will let you log in with your Xbox Live ID and aggregate all match data within a local SQLite database. The entire parsing logic is very much in flux (feel free to follow the discussion on this), but once it stabilizes I can see integrating this better in OpenSpartan Workshop.

For my own account, with more than seven thousand matches played, the entire aggregation took around 48 hours. I haven’t yet optimized (and parallelized) the code, so part of that can be attributed to me building a slower-than-needed tool, but it works for now and I can start analyzing the data.

The data that is available through the API is mostly good as-is, but an expanded dataset that accounts for film-based details enables me to see two things more clearly:

  • Mapping between gamertags and XUIDs at the time of the match (gamertags are mutable as users can change them, XUIDs are immutable). This way I don’t need to worry about doing out-of-band conversion to get an understanding of who I played against, since the match details API only returns XUIDs.
  • Times when specific events occur in-game. I can see how quickly I earn the first medal in the game, or how quickly I get to the first kill or death.

What’s next #

There are a few improvements that I want to make, both to the open-source tool that I built and to my understanding of the film files. I alluded to assists earlier - that’s a data point I definitely want to cover. Additionally, film files may contain the data required to build heatmaps of map movement. For that, we need to get better at replicating behaviors in the game - that is, understanding how the binary data changes with movement, weapon switches, use of grenades, and so on. Something tells me it will be a much more protracted project than I initially anticipated 🤔